

Section: New Results

Data and Knowledge Integration

Participants : Julio Cesar Dos Reis, Fayçal Hamdi, Rania Khefifi, Yassine Mrabet, Nathalie Pernelle, Chantal Reynaud, Fatiha Saïs, Brigitte Safar, Fabian Suchanek, Danai Symeonidou.

Reference Reconciliation

The reference reconciliation problem consists in deciding whether different data descriptions refer to the same real-world entity (the same person, the same conference, etc.). Some existing approaches, such as LN2R, are declarative and knowledge-based. Different kinds of knowledge can be declared in a domain ontology, such as disjointness between classes or key constraints. This knowledge can be exploited to infer reconciliation and non-reconciliation decisions.
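
As an illustration of how such declared knowledge can drive decisions, the following minimal sketch applies one disjointness axiom and two key constraints to toy descriptions. It is not the LN2R implementation: the class names, property names and the `decide` function are hypothetical and chosen only for the example.

```python
# Hypothetical toy ontology knowledge (all names are illustrative assumptions).
DISJOINT = {frozenset({"Person", "Conference"})}
KEYS = {"Person": ("name", "birth_date"), "Conference": ("title", "year")}


def decide(a, b):
    """Return 'same', 'different' or 'unknown' for two data descriptions."""
    # Disjoint classes: the two descriptions cannot denote the same entity.
    if frozenset({a["class"], b["class"]}) in DISJOINT:
        return "different"
    if a["class"] == b["class"]:
        key = KEYS.get(a["class"])
        if key:
            va = [a.get(p) for p in key]
            vb = [b.get(p) for p in key]
            # The key only applies when every key property is known on both
            # sides and all the values agree.
            if None not in va and None not in vb and va == vb:
                return "same"
    # No declared knowledge applies: leave the pair undecided.
    return "unknown"
```

In this sketch a pair with a missing key value stays undecided rather than being declared different, which reflects the fact that a key constraint supports reconciliation decisions while disjointness supports non-reconciliation decisions.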

Our reference reconciliation work pursues three directions:

  • develop an automatic approach to key constraint discovery. In [46] we proposed KD2R, a method for the automatic discovery of key constraints associated with OWL2 classes. These keys are discovered from RDF data, which can be incomplete. The proposed algorithm performs this discovery without having to scan all the data. KD2R has been tested on data sets from the international OAEI contest and obtains promising results.

  • develop a reference reconciliation method for detecting redundant data in web data tables that are semantically annotated by an ontology. Each table cell value consists of a numerical fuzzy set (NFS) or a symbolic fuzzy set (SFS). We have developed a method which uses ontology knowledge and computes similarity scores to decide whether data are redundant. We have also proposed two similarity measures, for numerical and for symbolic fuzzy sets. The proposed measures are more flexible than existing ones. This approach has been published in [36], [58]. We are working on extending it to distinguish redundant data from merely similar data by using provenance information.

  • develop a new approach which addresses the problem of resource discovery in the Linked Open Data cloud (LOD), where data described by different schemas is not always linked. In [42], [58] we proposed an automatic approach that allows the discovery of new links between data. These links can help to match schemas that are conceptually relevant with respect to a given application domain. Furthermore, these links can be exploited during the querying process in order to combine data coming from different sources. In this approach we exploit the semantic knowledge declared in different schemas in order to model: (i) the influences between concept similarities, (ii) the influences between data similarities, and (iii) the influences between data and concept similarities. The similarity scores are computed by an iterative resolution of two nonlinear equation systems that express the concept similarity computation and the data similarity computation.
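
The iterative resolution mentioned in the last item can be pictured with the following sketch, in which concept similarities and data similarities are updated in turn until they stop changing. The two update rules, the `weight` parameter and the function name are placeholder assumptions made for the example; they are not the equation systems of [42], [58].

```python
def iterate_similarities(base_concept, base_data, type_of,
                         weight=0.5, eps=1e-6, max_iter=100):
    """Alternate between a data system and a concept system until stable.

    base_concept: {(c1, c2): initial similarity between two concepts}
    base_data:    {(d1, d2): initial similarity between two data items}
    type_of:      {data item: concept it is an instance of}
    """
    concept_sim = dict(base_concept)
    data_sim = dict(base_data)
    for _ in range(max_iter):
        # Data system (toy rule): similar concepts reinforce the similarity
        # of their instances.
        new_data = {}
        for (d1, d2), s in base_data.items():
            c = concept_sim.get((type_of[d1], type_of[d2]), 0.0)
            new_data[(d1, d2)] = s * (1 + weight * c) / (1 + weight)
        # Concept system (toy rule): the best-matching instance pair is
        # evidence for the similarity of the two concepts.
        new_concept = {}
        for (c1, c2), s in base_concept.items():
            scores = [v for (d1, d2), v in new_data.items()
                      if type_of[d1] == c1 and type_of[d2] == c2]
            new_concept[(c1, c2)] = max(s, max(scores, default=0.0))
        # Largest change of any score in this round.
        delta = max(
            [abs(new_data[k] - data_sim[k]) for k in new_data]
            + [abs(new_concept[k] - concept_sim[k]) for k in new_concept],
            default=0.0,
        )
        concept_sim, data_sim = new_concept, new_data
        if delta < eps:
            break
    return concept_sim, data_sim
```

The `max_iter` cap is a safeguard for the sketch only, since convergence depends on the update rules chosen.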

Context-aware Personal Information Management

Personal information management (PIM) is the practice and analysis of the activities people perform to acquire, organize, maintain, and retrieve information for everyday use. PIM is a growing area of interest because everyone is looking for better use of limited personal resources of time, money and energy. Research on the topic is being carried out in several disciplines, including human-computer interaction, database management, information retrieval and artificial intelligence.

The increasingly large amount of personal information (e.g., mails, contacts, appointments) managed by a user is characterized by its heterogeneity, dispersion and redundancy. The general goal of this work is to design a system that gives end users access to their personal data through services relevant to their needs, both from mobile devices (smartphones) and from Internet-connected personal computers. More specifically, we focus here on the problem of defining a common meta-model for flexible and homogeneous personal information management. The meta-model that we propose allows users to create personal information and to organize it according to different points of view (ontologies) and different contexts. Contextual queries are defined to allow users to retrieve their personal information using geographical contexts. The Semantic Web languages (OWL, RDF and SPARQL) are used to implement the approach.

Mapping between ontologies

We pursue our work on ontology alignment in the setting of the ANR GeOnto project by aiming to provide full life-cycle support for ontologies.

We investigated how alignment results generated by our alignment tool, TaxoMap, can be used to enrich one ontology with another. We showed that the enrichment process depends on the characteristics of the ontology used for enrichment. Three enrichment contexts identified in the setting of the ANR project GeOnto have been studied, and the corresponding enrichment treatments performed. A first context considers ontologies of the same application domain and of a reasonable size. A second context considers small ontologies previously extracted from a generalist one. A third context considers enrichment from a huge, generalist ontology, such as Yago. Early results obtained in the setting of the ANR project GeOnto in the topographic domain have been published in [50], [25].

The module supporting our enrichment approach has been implemented in TaxoMap Framework using patterns. Initially, TaxoMap Framework was composed of our alignment tool, TaxoMap, which the team has been working on for several years, and of a mapping refinement module. We extended it into a broader framework and an interactive environment by including TaxoPart, a partitioning tool we developed to split two huge ontologies, which could not otherwise be aligned, into two sets of blocks of limited size, and a module specific to ontology enrichment. Moreover, we re-implemented TaxoMap as a web service to make it easily accessible at: http://taxomap.lri.fr:8000/axis2/services/TaxoMapService?wsdl .

We also started a PhD project, jointly with CRP Henri Tudor in Luxembourg, to investigate issues related to the evolution of medical knowledge organizing systems. We will define a formal framework to support this evolution in a consistent way, and to support the maintenance of the mappings directly impacted by the local evolution of a knowledge organizing system.

On a related topic, we have developed a probabilistic framework, PARIS (Probabilistic Alignment of Relations, Instances and Schema), for matching ontologies holistically, thereby exploiting synergies between matches on the instance level and matches on the schema level [57] . The framework is parameter-free and does not require resource-specific tuning. PARIS is fully implemented and has been shown to match some of the largest ontologies on the Semantic Web with a precision of around 90%.

Integration of Web resources

We have pursued our work on integration of resources available on the Web in Adaptive Hypermedia Systems (AHS), allowing creators to define their own adaptation strategies based on their own domain models.

The approach is based on a set of 22 adaptation patterns, independent of any application domain and of any adaptation engine, published in [59], [47]. These elementary adaptation patterns are organized in a typology in order to facilitate their understanding and their use in the EAP framework to define complex strategies. In [24], we described the whole process to generate complex adaptation strategies and how the generated strategies can be integrated into existing AHSs. The results of an experiment conducted in the e-learning domain are presented. They show that the pattern-based approach for defining adaptation strategies is more suitable than those based on "traditional" AH languages.

We also pursued our work on the integration of the EAP framework and other AHSs. Our collaboration with A. Cristea from the University of Warwick (UK) led us to a very detailed study of adaptation languages. The first flexible generic adaptation language is the LAG adaptation language. We studied the expressivity of this initial adaptation language in comparison with the language we newly proposed in the EAP framework, and the pros and cons of various design decisions with respect to the ideal way of defining an adaptation language. We proposed a unified vision of adaptation and adaptation languages. The unified vision is not limited to the two languages analyzed, and can be used to compare and extend other approaches in the future. Besides this theoretical, qualitative study, we also carried out an experimental evaluation and comparison of the two languages; an article is currently under review.

We have also investigated integration of Web services. The Search Computing project (“SeCo”) at the Polytechnic University of Milan aims to orchestrate Web services to answer user queries. Currently, the project represents Web services by so-called Service Marts. These are frame-like representations of the services, which follow the slot-value paradigm. This representation faces several challenges as more Web services are added to the system, because it is hard to ensure that Web services added by different users can still be joined. Therefore, we have explored a more ontological representation of Web services. In our proposal [55], Web services are represented as sub-graphs of an ontology. This allows users to add new Web services that re-use the vocabulary of existing Web services.
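
The idea of describing services as sub-graphs of a shared ontology can be sketched as follows. This is an illustration under our own assumptions rather than the representation of [55]: the toy ontology, the `Service` class and the `join_nodes` helper are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical shared ontology, given as a set of (subject, relation, object) edges.
ONTOLOGY = {
    ("Movie", "shownIn", "Theater"),
    ("Theater", "locatedIn", "City"),
    ("Restaurant", "locatedIn", "City"),
}


@dataclass
class Service:
    """A Web service described by the ontology edges it covers."""
    name: str
    edges: frozenset

    def __post_init__(self):
        self.edges = frozenset(self.edges)
        # A service may only use vocabulary that already exists in the ontology.
        unknown = self.edges - ONTOLOGY
        if unknown:
            raise ValueError(f"{self.name}: edges not in ontology: {sorted(unknown)}")

    def nodes(self):
        """Ontology classes touched by this service."""
        return {n for (s, _, o) in self.edges for n in (s, o)}


def join_nodes(a, b):
    """Classes shared by two services, i.e. candidate join points."""
    return a.nodes() & b.nodes()
```

Because every service is checked against the same vocabulary, two services registered by different users can be joined whenever their sub-graphs share a class, as with a movie search and a restaurant search that both refer to `City`.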

On related topics, together with researchers from the Max Planck Institute in Saarbrücken, we have worked on extending the YAGO ontology. YAGO already contains tens of millions of facts. With the present work, we aim to give these facts a temporal and a spatial dimension. For every event and every entity, we want to know where and when it existed. For this purpose, we have developed a methodology that extracts these types of facts from Wikipedia. We have also developed a logical reasoning framework that allows propagating these time and space annotations from some facts to others. This has grown YAGO to 80 million facts in total, making it an ontology that is anchored in time and space (Best demo award at the WWW 2011 conference [40]).